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Go-and-Back method: Effective estimation of the hidden motion of proteins 
from single-molecule time series 
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We present an effective method for estimating the motion of proteins from the motion of attached probe par- 
ticles in single-molecule experiments. The framework naturally incorporates Langevin dynamics to compute 
the most probable trajectory of the protein. By using a perturbation expansion technique, we achieve com- 
putational costs more than three orders of magnitude smaller than the conventional gradient descent method 
without loss of simplicity in the computation algorithm. Wc present illustrative applications of the method 
using simple models of single-molecule experiments and confirm that the proposed method yields reasonable 
and stable estimates of the hidden motion in a highly efficient manner. 



I. INTRODUCTION 

In single-molecule experiments of molecular motors, 
it has been a widely adopted strategy to visualize con- 
tinuous stepwise motion by attaching a large probe 
particle^— Recently, this technique has also been put 
into use for monitoring conformational changes in pro- 
teins that stochastically switch between two or more 
metastable statesj&£ Compared to using a fluorescent 
dye, using a probe particle has the following advantages: 
First, advances in technology now enable the monitoring 
of the particle at ultra-high temporal and spatial resolu- 
tions of up to 9.1 us£ and 0.1 mn^ respectively. Sec- 
ond, the particle can be manipulated under optical mi- 
croscopes, which provides insights into single- molecule 
mechanic o 3 ' 1 °~ — and energeticsi 14 ' 15 However, one prob- 
lem that remains in this method is that the observed 
motion of the probe particle does not precisely reflect 
the protein motion. For typical experiments, since the 
probe is large and loosely connected to the protein, the 
motion of the probe is usually delayed. To study protein 
dynamics in detail, the motion and the physical param- 
eters of the protein from the observed trajectory of the 
probe particle must be estimated. 

To date, there has been systematic effort in develop- 
ing both theoretical and numerical frameworks to de- 
termine a discrete state model of proteins from single- 
molecule fluorescent spectroscopy^^— The framework 
was recently combined with Bayesian statistics^ which 
allows to analyze the entire sequence of single-molecule 
data, and it was experimentally demonstrated that the 
method yields more reliable estimation than the conven- 
tional correlation analysis^ 3 - By the use of entire time se- 
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ries, a method to extract effective energy landscape has 
also been developed recently 24 ' 25 

For the time-series analysis of probe particles, there are 
several numerical approaches to estimating the underly- 
ing stepwise trajectories of the molecular motors from 
the motion of probe particles, and these have been ap- 
plied to various kinds of experiments £L22. However, to 
the best of authors' knowledge, all of these approaches 
attempt to discretize the observed trajectories, i.e., the 
probe trajectories into several discrete states. More im- 
portantly, these approaches do not incorporate the dy- 
namics of the entire system, namely, thermal fluctua- 
tions of both the protein and the probe, and the re- 
sponse delay of the probe motion. Instead, a sampling 
technique of the reaction pathway in continuous space, 
so-called transition path sampling (TPS) ) 31 ' 32 is based 
on Langevin dynamics. However, this requires significant 
computational cost to search for the dominant pathways, 
making it inefficient in the presence of multiple reaction 
pathways.— Although an effective method of estimating 
dominant reaction pathways (DRPs) has recently been 
developed,— ~— it is not straightforwardly applicable to 
the analysis of time series data because the method per- 
forms the path sampling using not constant time steps 
but constant displacement steps. 

In the present article, we consider a Langevin system 
that consists of two Brownian particles (one is visible 
and the other is hidden) connected with each other. On 
the basis of this model, we propose a method to effi- 
ciently estimate DRPs of the hidden variable from the 
trajectories of the visible variable. Although the model 
is very simple, it can be considered as a crude description 
of single-molecule experimental setups under appropri- 
ate approximations. We assume that the model and all 
the parameters have been determined and focus on the 
estimation of the DRP of the hidden variable. As will 
be shown later, even though the model is already given, 
finding the most probable trajectory of the hidden vari- 
able using the trajectory of the visible variable remains 
non-trivial. For parameter estimation in the presence of 
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hidden degrees of freedom, we have proposed a general 
framework and the practical utility was demonstrated by 
a simple Langcvin model.— In this framework, the DRP 
plays a central role in parameter estimation. Our final 
remarks in the present article are a discussion of prac- 
tical application of the proposed method to parameter 
estimation. 

Once the model is given, the path-probability can 
be expressed in terms of the Onsager-Machlup path 
probability^ - — Thus, in principle, we can apply a stan- 
dard maximum likelihood estimation to the hidden tra- 
jectory. However, we find in general that the conven- 
tional optimization algorithm requires too high a com- 
putational cost to find the DRP. Here, we develop an 
effective approximation technique with the aid of pertur- 
bation theory. The schematic procedure of our method is 
as follows. First, on the basis of a rough estimate of the 
DRP, we solve one differential equation in the forward- 
time direction. Next, by substituting this solution, we 
solve another differential equation in the reverse-time di- 
rection and obtain a better estimate of the DRP. By re- 
peating this procedure, we can systematically increase 
the accuracy of the estimate. Since we solve these differ- 
ential equations by alternating between the forward and 
reverse (backward) directions, we name this algorithm 
the Go-and-Back method. 

In Sec. II, we introduce a working model and discuss 
the validity of the model. Then, we explain the problem 
with the gradient descent method and derive the Go- 
and-Back method. In Sec. Ill, we examine two models 
of typical single-molecule experiments to investigate the 
effectiveness of our method. 



II. FRAMEWORK 
A. Model 

The molecular structure of protein typically consists 
of a few hundred of amino acids. Therefore, in general, 
we need to consider such a huge number of degrees of 
freedom to study molecular dynamics of proteins. How- 
ever, recent experimenta l 6 ' 7 ' 15 ! 40 ' 41 studies on motor pro- 
teins clarified that only a few degrees of freedom dom- 
inate the large-scale conformational changes. In partic- 
ular case of the rotational molecular motor Fi-ATPasc, 
it has been experimentally shown that the energy con- 
servation of the entire system is explained by consid- 
ering only one-dimensional (rotational) motion . 15 ' 41 In 
addition, normal mode analysis on motor proteins 4 ^— 
implies that the low-frequency modes (a few degrees of 
freedom) correspond to such large-scale motion (us-ms) , 
and the low-frequency modes are distinct from higher- 
frequency modes (a huge number of degrees of freedom) 
that may correspond to the local conformational fluctu- 
ations and the catalytic reactions at the active site (ps— 
ns). Therefore, within the typical time resolution of an 
optical microscope (ps-ms), the large number of high- 
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FIG. 1. Models for the numerical experiments. Model A: 
x(t) is stochastically switching between two local minima 
of V cff (x) by means of thermal noise (equilibrium state). 
Model B: x(t) is stochastically stepping down the tilted ef- 
fective potential V eS (x) = V(x) — fx, where V(x) is a pe- 
riodic function and / is the driving force (non-equilibrium 
steady-state). The parameters in these models are chosen 
as follows. Model A: V eS [x) is a polynomial function de- 
fined as V ciI (x) = ELi^, where oi = 2.57, a 2 = 114.4, 
a 3 = -14.0, a 4 = -191.9, a 5 = -81.3, a 6 = 196.4, a 7 = 29.1, 
and a 8 = -51.50 Model B: V(x) = Asin(2nx/l), where 
A = 20, I = 1, and / = 40. In both models, 7 = 2, T = 20, 
k = 160, k B T = 4.11, and At = 0.01. 

a In typical experiments, the bimodal distribution of the probe 
particle is best fitted by two Gaussian functions. Thus, we 
roughly fitted two Gaussian functions of different means and 
variances with an eighth-order polynomial function in order to 
reproduce experimental results. 

frequency modes arc eliminated from the dynamics of 
proteins and thus the dynamics can be approximated by 
low-dimensional overdamped Langcvin equations. 

Taking into account the above facts, we consider a 
Langevin system that consists of two Brownian particles 
interacting with each other: 

>yx = -d x [V eS (x)+U(x,y))+Z, (1) 
Ty = F(y)-d y U(x,y) + V , (2) 

where 7 and T are the friction coefficients, £(t) and 
r](t) are zero-mean white Gaussian noise with variances 
2^k*B,T and 2rksT, respectively. If x(t) and y(t) are re- 
garded as the dominant degrees of freedom of the pro- 
tein and the probe particle, respectively, Eqs. ^ and © 
can be considered as a crude model of single-molecule 
experiments! 36 ' 46 ' 47 Note that we consider the simplest 
case where both x(t) and y(t) are one-dimensional vari- 
ables because typical single-molecule experiments moni- 
tor only one-dimensional motion, but the following cal- 
culation is straightforwardly extended to higher dimen- 
sional x(t) and y(t). 

In the present model, 7 corresponds to the sum of the 
internal friction coefficient of the protein and the viscous 
friction coefficient between the protein and the medium, 
r corresponds to the viscous friction coefficient of the 
probe particle. For simplicity, we assume that 7 and T 
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are position- independent. U{x, y) corresponds to the en- 
ergy potential of the elastic linker between the protein 
and the probe particle. V eS (x) = V(x) — fx is the ef- 
fective energy potential of the protein, where V(x) cor- 
responds to the energy landscape profile of the protein 
along the reaction coordinate and / corresponds to the 
"driving force" provided by the catalytic reaction such 
as ATP hydrolysis. In actual experiments, trapping force 
or space-constant load is sometimes applied to the probe 
particle. We also incorporate such an external force into 
the model equations denoted by F(y). We assume that 
U(x,y), V eff (a;), and F{y) are independent of t. 

Here, let us discuss the validity of the working model 
by using two examples displayed in Fig. [TJ First, we 
consider that a protein has two chemical states: A and B 
states, and the protein stochastically goes back and forth 
between the two states. We suppose that the two chemi- 
cal states exhibit different conformations and we can ob- 
serve the switching motion by attaching a probe parti- 
cle. Here, if the off rates of A(substrate) and B(product) 
from the protein are slower than the switching rates be- 
tween the two states and the observation period, the 
entire system is well approximated as an equilibrium 
state. In addition, if the switching rates are slower than 
the global relaxation timescale of the protein structure, 
namely, the transition rates are well explained by the 
Kramers' model^ the entire dynamics can be modeled 
by a double- well potential of V(x) with / = and F = 



(Fig. 

Next, we suppose that a molecular motor translocates 
in one direction with regular steps by catalytic reactions. 
For example, a rotary molecular motor Fi-ATPase ro- 
tates counterclockwise with regular 120° steps by hy- 
drolyzing ATP^ In typical observation period(several 
minutes), the entire system is regarded as a noncquilib- 
rium steady-state because the concentration of ATP and 
the products (ADP and Pj) are almost constant in this 
period. If ATP molecules are abundant in the medium 
and the rate limiting reaction of each step is the global 
conformational change of the protein by its thermal fluc- 
tuation, the phenomenological model can be described 
by using a tilted periodic potential (Fig. [TJ)). 

In this manner, the dynamics of proteins and at- 
tached probe particles can be approximated as the sim- 
ple Langcvin model under appropriate conditions. Of 
course, the present model has several limitations to ap- 
ply actual experiments due to the simple approxima- 
tions especially the position-independent 7 and the time- 
independent V cS (x). The details will be discussed in 
Sec. IV. 

In what follows, we assume that y(t) is observed with 
a sufficiently high temporal and spacial resolution, while 
x(t) is hidden. We also assume that the entire set of sys- 
tem parameters is given. Then, our final task to consider 
here is the estimation of the most probable trajectory of 
x(t) from the trajectory of y(t). 



B. Path probability 

We denote the set of the trajectories from time t — to t = t as [x,y] and the entire set of system parameters as 
II = (IIi,Il2, • • ■ ,n p ). Given a value of II, we can calculate the path probability P([x, y] \U) as follows. 
First, P([x, j/]|II) is decomposed into the initial and transition probabilities as 

P([x,I/]|II) = i» in it(xo J W)|n)Ptr((a:o ) lft)) ->■ [x,v]\H), (3) 

where xq and yo denote x(0) and y(0), respectively. 

Next, the transition probability can be expressed in terms of the Onsager-Machlup path-probability ; 37 " 39 

Ptr((x ,yo) "> [xM\n)=Cexp[-pS([x,vYM ( 4 ) 
where /3 _1 = k^T and the action functional S([x, y\; II) is defined as 

S([x,y];II) = ±- r^ + Vf^ + U^ytft-^ f [Vf x (x) + U xx (x, y)} t 
^1 Jo z l Jo 

rT [Ty-F(y) + U y (x,y)] 2 t-%? f [-F y (y) + U vv (x, y)] t, (5) 

XX • 

N 



AT J L " yc " yv la " ■ 2T 
where the total and partial differentiations are denoted using the same notation such as V cS ' = V x s and d xx U = U L 



When the trajectory [x,y] is time-discretized by At, the normalization constant becomes C — [y/jT/(4irkBTAt)] N , 
where r = NAt. 

Therefore, if we adopt an appropriate approximation for the initial distribution Pi n it(£o, 2/o|II)r^ we can compute 
the path probability by Eqs. 

Once we obtain a concrete expression for the path probability, we can estimate the DRP by a standard maximum 
likelihood estimation with respect to [a;]. In Bayesian statistics, the maximum likelihood estimator (MLE) is included 
in the maximum a posteriori (MAP) estimator as a special case. It has been clarified in our previous work that the 



4 



MAP estimator does not coincide with the true trajectory of x(t)£& However, when the motion of x(t) is stepwise like 
that of molecular motors, we find that the MAP estimator seems to be a good estimator of the stepping motion of 
x(t). To maintain consistency with our previous work and also for convenience when considering the application of 
the Go-and-Back method to parameter estimation, we refer to the MLE as the MAP estimator in the present article, 
which we denote by [x] in the following sections. 



C. Gradient descent 



The gradient descent method is the most widely used optimization algorithm. Before proceeding to the Go-and-Back 
method, let us consider why this standard method is inefficient for the present problem. 

To maximize P([x, y]|II) with respect to [x], an initial condition for [x] and a boundary condition are required. 
Here, we adopt the Dirichlet boundary condition. (For the initial condition, for instance we can adopt [x] = [y].) 
In this case, xo is fixed, and thus we avoid having to consider the initial distribution Pmit(xO) j/o|n). Therefore, the 
maximization of P([x, y]|n) is replaced by the minimization of action functional S([x, y\; II) with respect to [a;]. By 
introducing a "virtual time" s and replacing x(t) with x(t, s), we obtain the MAP estimator [x] as the solution of the 
following partial differential equation in the limit of s — > oo: 



dx(t,s) = 6S([x,y];U) 
ds Sx 

dW(x, y) 



dx 



7 d 2 x(t, s) 

2^7~' (6) 



where 



W(x,y) = ±\Vf(x) + U x {x,y)] 2 - ^[V^ (x) + U xx {x,y)] 

+ ^i-F(y) + U y (x,y)f - ^f{-F y (y) + U yy (x,y)]. (7) 

Here, if we consider s in Eq. ((6]) as a "real" time t, and t in Eq. (J6j> as a spacial coordinate x, the form of Eq. ([6]) is 
similar to the time-dependent Ginzburg-Landau (TDGL) equation for non-conserved systems^ When W(x,y) takes 
the form of a multi-well function, it is known in general that, after the fast relaxation of the local fluctuation, the 
global relaxation (the kink motion) takes a very long time i 51 ' 52 Therefore, in the presence of the hopping motion of 
x(t) between multiple local minima, the gradient descent method may require a large computational cost. 

I 

D. Go-and-Back method and using Eq. @, we finally get 



As mentioned in the previous section, the gradient de- 
scent method requires a large computational cost. On the 
basis of perturbation theory and making the proper as- 
sumption that 7/r <C 1, we solve this problem as shown 
below. 

To solve the Euler-Lagrangc equation 

SS{{x,y];Tl) _ dW(x,y) 7 _ 

Sx ~ dx 2 U ' W 

we decompose the equation into two first-order ordinary 
differential equations by introducing v(t) which satisfies 

jx = -[Vf(x) + U x (x,y)}+v. (9) 

v(t) is a deterministic variable which mimics the random 
force £(t) [See Eq. ©]. Substituting Eq. © into Eq. flSJ) 



jv = [V°«(x) + U xx (x, y)]v - k B T[V°« x (x) + U xxx (x, y)\ 
+ fU xy {x, y)[Uy{x, y) - U y (x*,y)] 

+ ~U xy (x,y)-ri*, (10) 

where • represents Ito-type stochastic calculi** 3 - and x* 
represents the true position of x at time t. Since [y] is 
realized under the true [x], we must use x* instead of x 
in Eq. ©. 

Here, unknown variables x* and rf are involved in 
the third and the fourth terms in the right-hand side 
of Eq. (fit)]) . Fortunately, the contribution of these two 
terms are negligible. First, [U y (x, y) — U y (x* , y)} is statis- 
tically close to if [x] ~ [£]. Similarly, (U xy (x, y) ■ i]*) = 
0. This calculation is independent of [x]. Moreover, con- 
sidering the typical case of single-molecule experiments, 
we can naturally assume 7/r <C 1. 

Therefore, in principle, we may obtain a good approx- 
imate solution of [x] by solving the following differential 
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equations simultaneously: 

72: = -G x (x,y) + v, (11) 
jv = G xx {x,y)v - k B TG xxx {x,y), (12) 

where G(x, y) = V cti (x)+U(x, y). Here, Eq. ^ is rewrit- 
ten as Eq. (fTTj) , and Eq. (fTOj) is approximated as Eq. (|T2"T) . 

However, a difficulty remains: G xx {x,y) on the right- 
hand side of Eq. ([T2j) is positive in the typical case 
when x(t) is fluctuating around the bottom of the en- 
ergy potential. Thus, as soon as we numerically inte- 
grate Eqs. (fTT|) and (fT2|) . v(t) will destabilize and finally 
diverge. In contrast, if we try to solve these equations in 
the reverse-time direction, x(t) will be instantly destabi- 
lized due to the same problem^ 

To overcome such an initial value problem, we use that 
the second term in the right-hand side of Eq. (fT2"]) is small. 
In actual experiments, the data are discretized with At. 
This time interval is sub-millisecond for typical cases, 
which is much longer than the timescale of the local relax- 
ation of proteins (ps-ns). Within this coarse timescale, 
the rough energy landscape is effectively smoothed. In 
addition, for most times x(t) spends in the potential well. 
In this period, the second derivative of the effective po- 
tential G xx {x 1 y) becomes dominant and the higher-order 
derivatives will be negligible. Hence the second term in 
Eq. (TT2")) will be small for most times. 

We introduce a perturbation parameter £r& and 
rewrite Eq. (fT2"|) as 

jv = G xx {x,y)v - e kBTG^ix.y). (13) 

We expand x(t) and v(t) in terms of power series of e, 
and we define the i-th order approximation as 

i 

x (i)( t ) = J2 ejx u)( t )' ( 14 ) 

i 

Note that the present definition of the i-th term differs 
from the usual definition: the i-th order approximation 
includes all terms from orders to i This definition is 
crucial to dramatically simplify the algorithm. In addi- 
tion, we introduce a proper approximation for the zeroth- 
order term of v(t): 

«(„)(«) =0. (16) 

(For the reason, see Appendix A.) Then, once we adopt 
appropriate boundary conditions for x^(0) and vu\(r), 
the higher-order approximate solutions of [x] can be suc- 
cessively obtained as follows. (For details of the deriva- 
tion, see Appendix A.) 



Go-and-Back method: 

A. Using vn\(t), we solve 

7%) = -G x (x {l) ,y) + v {z) + 0{e l+1 ) (17) 

from t — to t = t and obtain x^(t). 

B. Using xu\(t), we solve 

7«(t+i) = Gxx(x(i),y)v(i+i) —e k B TG xxx {x(i),y) 

+0(e l+2 ). (18) 

from t — r to t = and obtain t>( i+1 )(i). 

C. Alternate between Step A and Step B. 



III. EXAMPLES 

To investigate the practical utility of the Go-and-Back 
method, we examine the two models illustrated in Fig. [1] 
We numerically integrate the model equations [Eqs. ([1]) 
and ((2J] from t = to t = r and obtain the true trajec- 
tory set [x, y]. Then, we assume that we can only monitor 
[y], and estimate [x] from [y\. 

Figures [2] and [3] display the estimation results of model 
A and model B by the Go-and-Back method, respectively. 
Although the estimated trajectory of x(t), denoted as [x], 
does not coincide with the true trajectory [x] , [x] seems to 
be a good estimate in both examples. In particular, step- 
wise motion of x(t) in model B is precisely reproduced 
from noisy y(t) (Fig. [3]). 

To apply the Go-and-Back method, a boundary con- 
dition is required. (The initial condition is included 
in the algorithm [Eq. (fTB"|) ]. The validity well be dis- 
cussed below.) For both examples (Figs. [2] and [3]), 
{x(j)(0), V(i)(r)} = {y(0),0} is adopted. To investigate 
the stability of the Go-and-Back method, we vary the 
boundary condition of model A at random. As a re- 
sult, effect of the boundary condition is diminished in- 
stantly (Fig. 0]). Only less than 0.1% of the total length 
of the trajectory at both ends is affected by the bound- 
ary condition. Thus, the estimates are almost uniform 
against the choice of the boundary condition. The Go- 
and-Back method allows to expect the diminishing time 
scale from Eq. (|11[) by the second order approximation 
on G(x, y) around the bottom of the effective potential. 
For the present model, the diminishing time scale is ap- 
proximated as Tdi m ~ 0.001j££ which is almost consistent 
with the numerical results (Fig. [4]). 

We also examine the validity of the initial condition 
adopted in the algorithm. We apply noise to the origi- 
nal initial condition [Eq. (fT6"|) ] and evaluate the depen- 
dency on the estimate. As a result, the original estimate 
and the estimate obtained from the noise-added initial 
condition are almost completely overlapped (Fig. 0] and 
Fig. SI). The difference is 10 -4 on average, which is 



Model A: Go-and-Back 




FIG. 2. Estimation result of model A by the Go-and-Back method, (top) The trajectory of y(t) [red line], (bottom) The true 
trajectory of x(t) [gray line], and the estimated trajectory of x(t), denoted as x(t) [blue line]. The iteration number of the 
optimization process is i — 50, and the data length is r = 100. For the present parameter setting, the relaxation time scale 
of x(t) is shorter than At. Therefore, we use linear-interpolated [y] with the step size At/10 to stably integrate Eqs. (|17|) and 

GUI- 



Model B: Go-and-Back 




time 

FIG. 3. Estimation result of model B by the Go-and-Back method, (top) The trajectory of y(t) [red line], (bottom) the true 
trajectory of x(t) [gray line] and the estimated trajectory x(t) [blue line], i = 50 and r = 50. For the present parameter setting, 
the relaxation time scale of x(t) is shorter than At. Therefore, we use linear-interpolated [y] with to stably integrate Eqs. (|17[) 
and (TTHJ). 



10 4 times smaller than the distance between two stable 
states, and 10 3 times smaller than the thermal fluctu- 
ation of x(t)£L We note that the Go-and-Back method 
bases on the Euler-Lagrangc equation [Eq. (JSJ], which 
means we cannot guarantee that the obtained estimate is 
the global minimum. However, such a robustness against 
the choice of the initial condition suggests that the esti- 
mate may be the global optimum solution at least in the 
present model. To conclude, the Go-and-Back method 
incorporates the appropriate initial condition that yields 
stable solution, and the method is quite robust against 
the boundary condition. 

Next, we compare the Go-and-Back method with the 
conventional gradient descent method. We use the same 
[y] as for the Go-and-Back method (Fig. [3J top) and 
adopt the FTCS (forward-time centered-space) scheme^ 



for the minimization of S([x,y];U). To apply FTCS 
scheme, the initial and the boundary conditions are re- 
quired. For the boundary condition, we adopt Dirichlct 
condition {xq,x t } = {j/o?2/r}, and for the initial condi- 
tion, we adopt raw data of [y], or the moving-averaged 
trajectories of [y] with different bin sizes in order to in- 
vestigate effect of the initial condition. 

Figure [S] shows the relaxation property of S([x, y];Tl) 
between the two methods. In the case of the Go-and- 
Back method, after the quick relaxation the value of 
S fluctuates slightly around the minimum value. In 
some cases, we can recognize that the method overcomes 
large barrier of S([x,y];Tl) (Figs. S2 and S3)^ For the 
present model, i = 15 is enough to converge the solution. 
Compared to the Go-and-Back method, as one expects, 
the gradient descent method requires too many iteration 
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FIG. 4. Effect of the boundary condition. The boundary 
condition {x^ (0), (r)} of model A is varied randomly (n — 
20), and the standard deviations of x(t) at each t are plotted. 
The diminishing times at both boundaries are evaluated by 
fitting exponent functions. x^{0) is varied Gaussian noise 
with mean i/(0), S.D. = 0.5 [half length of the two stable 
states]. vu\(t) is varied zero-mean Gaussian noise with the 
variance 2^yksT [the same power of £(£)]. 
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FIG. 5. Effect of the initial condition. Zero-mean white Gaus- 
sian noise with the variance 0.25 x 2^k-e,T is added to the orig- 
inal initial condition of the Go-and-Back method [Eq. (|16p ]. 
and the difference between the estimate from the noise-added 
initial condition rr n oise(£) and the the original estimate x(t) is 
evaluated at each t (10 4 data points). See also Fig. SI. 



steps for the relaxation. (The relaxation time is 10 4 times 
slower.) Moreover, the converged value of S([x, y]; II) 
in the case of the gradient descent method is strongly 
dependent on the initial condition. In the present case, 
roughly smoothed data (MA 50) yields the smallest value 
of S([x,y];U), but it is still a little bit larger than that 
of the Go-and-Back method. Such a result implies that 
the optimization processes are trapped at the local min- 
ima. Indeed, x(t) is varied among the initial conditions. 
Two examples are shown in Fig. [7] Even the best opti- 
mized solution of the gradient descent method (MA 50), 



FIG. 6. Relaxation properties of the action functional 
S([x, y]; TV) in the optimization process, (left) Go-and-Back 
method. The data is taken from the numerical experiment 
shown in Fig. [2] (model A), (right) Gradient descent method. 
We use the same [y] as for the Go-and-Back method (Fig. [2] 
top). For the initial condition, we test raw data of [y] and 
the moving-averaged (MA) trajectories of [y] with bin sizes 
20, 50, 100, and 200. (The time lengths are 0.2, 0.5, 1, and 2, 
respectively.) 

the motion of x(t) cannot be precisely estimated (Fig. 
bottom) . 

We also apply the gradient descent method to model B 
and obtain features similar to those observed for model 
A (Figs. S3 and S4). 

IV. CONCLUSION AND DISCUSSION 

We developed a method to estimate the most probable 
trajectory of the hidden variable from the trajectory of 
the probe particle. The method naturally incorporates 
Langevin dynamics. Therefore, although the difficulty of 
the model settings still remains, if we carefully choose 
the model, we may extract much more information from 
experimental data than using the conventional step anal- 
ysis. 

By use of simple models of single-molecule experi- 
ments, we numerically verify that the proposed method 
provides us reasonable estimates. Comparing to the con- 
ventional gradient descent method, our proposed method 
can successfully reduce the computational cost more than 
10 3 -fold. In the case of the gradient descent method, 
the choice of the initial conditions is crucial for obtain- 
ing accurate estimates: the wrong choice leaves the opti- 
mization process trapped at the local minima. Although 
we can adopt several techniques, for example, simulated 
annealing, to overcome this problem, case-by-case treat- 
ment is required to improve efficiency. In contrast, our 
method naturally incorporates an appropriate initial con- 
dition. Besides the simplicity of our coding, the proposed 
method only requires a boundary condition, and the re- 
sults are quite robust with respect to this choice. There- 
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Model A: gradient descent 
raw 




time 



FIG. 7. Two examples of the estimation results of model A by the gradient descent method. The true trajectories of x(t) [gray 
line], and the estimated trajectories x(t) [blue line] are shown, (top) The raw data of [y] is adopted for the initial condition, 
(bottom) A moving-averaged trajectory of [y] with bin size 50 is adopted for the initial condition. The orange arrows show the 
regions where x(t) cannot be precisely estimated. In both figures, i = 10 5 . 



fore, our method is easy to use, which is important for 
general users. 

As we mentioned above, the present model has several 
limitations to apply actual experiments. In the present 
model, it is assumed that the friction coefficient of protein 
denoted by 7 is independent of x(t). This term originates 
from the internal friction of protein and the viscous fric- 
tion between the protein and the medium. In general, 
the internal friction should be varied along the reaction 
coordinate. Indeed, position-dependent coefficients have 
been obtained from model systems and several kinds of 
proteins by simulation^ - — Therefore, although the con- 
stant approximation is not valid for all kinds of proteins, 
our assumption may be proper for the protein that has a 
large globular domain and the entire domain tilts or shifts 
in its structural transition because the internal friction is 
negligible compared to the large viscous friction. 

In addition, the present model assumes that the effec- 
tive energy potential of the protein V eS (x) is time in- 
dependent. However, recent single-molecule studies on 
cholesterol oxidase^ 4 - and /3-galactosidase^ showed direct 
evidence that the enzymatic turnover events are not a 
Markovian process but correlated with the previous his- 
tory. The result indicates that the protein has slow con- 
formation fluctuations and this time dependence should 
be appeared in V eS (x). Although, as far as we know, 
the similar slow dynamics has never been observed in 
the motor proteins, one has to verify in advance whether 
such a slow dynamics presents in the protein of interest. 
By contrast, if the substrate of the enzymatic reaction 
is not abundant in the medium so that the rate limit- 
ing step of the turnover is substrate binding, at least, 
on and off states should be incorporated into the model. 
Namely, V (x) should have two (or more) states and 
switch stochastically . 46 ' 47 ! 66 ' 67 For this case, new efficient 
estimation algorithm must be developed on the basis of 
the switching state model. 

In actual applications of our model to experiments, 
we must estimate the parameters of the entire model 



in advance. We have proposed a general framework 
of parameter estimation in the presence of hidden de- 
grees of freedom.— According to this framework, which 
is based on Bayesian statistics, we can precisely estimate 
the model parameters by maximizing the marginalized 
path probability with respect to the parameters. The 
marginalized path probability, called a marginal likeli- 
hood, is calculated by integrating all possible trajecto- 
ries of the hidden variables. In this calculation, the 
DRP, or the MAP estimator in the language of Bayesian 
statistics, plays the central role. For the analytical ap- 
proach, the Wentzel-Kramers-Brillouin (WKB) approx- 
imation around the MAP estimator may work well when 
the temperature of the system is sufficiently small . 36 ' 39 
For the numerical approach, we can utilize the MAP esti- 
mator for the initial condition of the Markov-chain Monte 
Carlo (MCMC) method 3 ^. By the use of the proposed 
method, the next step is to investigate how effectively 
we can obtain reasonable estimates of the model and the 
hidden trajectories simultaneously. 
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Appendix A: Perturbation expansion 

We expand x{t) and v(t) as 

oo 



(Al) 



(A2) 



where we denote the i-th order terms of x(t) and v(t) 
as x^(t) and v^(t), respectively, in order to distinguish 
from x (l] (t) [Eq. CLl] and v (l) (t) [Eq. (|T5])]. 
The 0-th order terms become 



7 ± (0) = -G x (x i0 \y) + v {0 \ 



(A3) 
(A4) 



G xx (x^°\ y) > in typical cases, and thus will in- 

stantly destabilize when we solve Eq. (|A4[) in the forward- 
time direction. In contrast, if we solve Eq. (|A4[) in the 
reverse-time direction, it is expected that (<) quickly 
relaxes to around 0. Therefore, we can naturally assume 
the approximation v^°'{t) = 0. Since ?/°'(t) is identical 
to U(o) (t) , we introduce the approximation Eq. (|16l) in 
the main text. 

Owing to this assumption, we can stably solve x^ ' (t) 
in the forward direction by adopting the initial condition 
on a;(°)(0). 

The first-order terms become 

7 ± (1) = -G xx (x^,y)x^+v^, (A5) 
7 i> (1) =G xx (x^,y)v^ -k B TG xxx (x^,y). (A6) 

It is noteworthy that x^(t) is not included in Eq. (|A6|) . 
and thus we can stably solve Eq. (|A6|) in the reverse 
direction. Then, by substituting wW(t) into Eq. (|A5|) . 
we obtain xW(i). 

Similarly, the second-order terms become 



7 ± (2) = --G xxx {x^\y)[x^] 2 - G xx (xW,y) X ™ 



,(2) 



(A7) 



iv (2) = G xxx (x® , y) x W V W + G xx , y)v& 
-k B TG xxxx (x i0 \y)x (1 K 



(A8) 

Again, x^ 2 ^(f) is not included in Eq. (|A8[) . Therefore, we 
obtain t>( 2 ) (t) and x^ 2 ^ (f ) by the use of the lower-order 
solutions. In this way, we can systematically calculate 
the higher-order terms. 

However, as the order number increases, the analytical 
solution becomes complicated. We further develop the 
method to simplify the algorithm. 

We introduce Eqs. (fl"4")l and (fT5|) . In these definitions, 
X(o)(t) and U(o)(i) are identical to x^°\t) and y(°\t), re- 
spectively. 

Next, combining Eqs. (|A3|) and (|A5[) leads to 

7i ( i) = -G x (x ( i), y) + v ( i) + 0(e 2 ), (A9) 



and combining Eqs. (IA4[) and (|A6[) leads to 

7« (1) = Gijtuo),!/)^) - ek B TG xxx (x {0} ,y).(AlO) 
Similarly, 

7i (3) = -G x (a: ( 2),») + u (2) + 0(e 3 ), (All) 

7*(2) = G , XK (x (1 ),y)?; (2) -efeTGxK^!),!/) 

+0(e 3 ). (A12) 

In this manner, we obtain the general forms, Eqs. (|17[) 
and (fTSl 
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